94 research outputs found

    Evaluation of SNMP-like protocol to manage a NoC emulation platform

    No full text
    International audience—The Networks-on-Chip(NoCs) are currently the most appropriate communication structure for many-core embedded systems. AnFPGA-based emulation platform can drastically reduce the time needed to evaluate a NoC, even if it is composed by tens or hundreds of distributed components. These components should be timely managed in order to execute an evaluation traffic scenario. There is a lack of standard protocols to drive FPGA-based NoC emulators. Such protocols could ease the integration of emulation components developed by different designers. In this paper, we evaluate a light version of SNMP (Simple Network Management Protocol) to manage an FPGA-based NoC emulation platform. The SNMP protocol and its related components are adapted to a hardware implementation. This facilitates the configuration of the emulation nodes without FPGA-resynthesis, as well as the extraction of emulation results. Some experiments highlight that this protocol is quite simple to implement and very efficient for a light resources overhead

    Adaptive FPGA NoC-based Architecture for Multispectral Image Correlation

    Full text link
    An adaptive FPGA architecture based on the NoC (Network-on-Chip) approach is used for the multispectral image correlation. This architecture must contain several distance algorithms depending on the characteristics of spectral images and the precision of the authentication. The analysis of distance algorithms is required which bases on the algorithmic complexity, result precision, execution time and the adaptability of the implementation. This paper presents the comparison of these distance computation algorithms on one spectral database. The result of a RGB algorithm implementation was discussed

    Task mapping and mesh topology exploration for an FPGA-based network on chip

    No full text
    International audienceTask mapping strategies on NoC (Network-on-Chip) have a huge impact on the timing performance and power consumption. So does the to-pology. In this paper, we describe the exploration flow of task mapping algorithms using different NoC mesh shapes. The flow is used to evaluate timing and energy consumption based on a NoC emulation platform. It is open to any task mapping algorithms and to any shapes of NoC mesh. A heterogeneous (PC and FPGA) platform is used to fully perform each step of the flow. The experiments demonstrate that the most appropriate task mapping strategy and the most suitable NoC shape strongly depend on the algorithm used. Depending on the timing latency results obtained and the FPGA resources used, the designer can select the appropriate task mapping strategy on the suitable shape in a short exploration time and with precise timing evaluation

    Exploration d'architectures génériques sur FPGA pour des algorithmes d'imagerie multispectrale

    Get PDF
    Les architectures multiprocesseur sur puce (MPSoC) basées sur les réseaux sur puce (NoC) constituent une des solutions les plus appropriées pour les applications embarquées temps réel de traitement du signal et de l image. De part l augmentation constante de la complexité de ces algorithmes et du type et de la taille des données manipulées, des architectures MPSoC sont nécessaires pour répondre aux contraintes de performance et de portabilité. Mais l exploration de l espace de conception de telles architectures devient très coûteuse en temps. En effet, il faut définir principalement le type et le nombre des coeurs de calcul, l architecture mémoire et le réseau de communication entre tous ces composants. La validation par simulation de haut niveau manque de précision, et la simulation de bas niveau est inadaptée au vu de la taille de l architecture. L émulation sur FPGA devient donc inévitable. Dans le domaine de l image, l imagerie spectrale est de plus en plus utilisée car elle permet de multiplier les intervalles spectraux, améliorant la définition de la lumière d une scène pour permettre un accès à des caractéristiques non visibles à l oeil nu. De nombreux paramètres modifient les caractéristiques de l algorithme, ce qui influence l architecture finale. L objectif de cette thèse est de proposer une méthode pour dimensionner au plus juste l architecture matérielle et logicielle d une application d imagerie multispectrale. La première étape est le dimensionnement du NoC en fonction du trafic sur le réseau. Le développement automatique d une plateforme d émulation sur mono ou multi FPGA facilite cette étape et détermine le positionnement des composants de calcul. Ensuite, le dimensionnement des composants de calcul et leurs fonctionnalités sont validés à l aide de plateformes de simulation existantes, avant la génération du modèle synthétisable sur FPGA. Le flot de conception est ouvert dans le sens qu il accepte différents NoC à condition d avoir le modèle source HDL de ce composant. De nombreux résultats mettent en avant les paramètres importants qui ont une influence sur les performances des architectures et du NoC en particulier. Plusieurs solutions sont décrites, commentées et critiquées. Ces travaux nous permettent de poser les premiers jalons d une plateforme d émulation complète MPSoC à base de NoCThe Multiprocessor-System-On-Chip (MPSoC) architectures based on the Network-On-Chip (NoC) communication are the one of the most appropriate solution for image and signal processing applications under real time constraints. Due to the ever increasing complexity of these algorithms, the types and sizes of the data manipulated, the MPSoC architectures are necessary to meet the constraints of performance and portability. However exploring the design space of such architecture is time consuming. Indeed, many parameters should be defined such as the type and the number of processing cores, the memory architecture and the communication network between all these components. Validation by high-level simulations has the lack of the precision. Low-level simulation is inadequate for such big size of the architecture. Therefore, the emulation on FPGA becomes inevitable. In image processing, spectral imaging is more and more used. This technology captures light from more frequencies than the human eye increasing the number of wavelengths. Invisible details can be extracted from a scene. The difference between all spectral imaging applications is the number of wavelengths and the precision. Many parameters affect the characteristics of the algorithm, having a huge impact on the final architecture. The objective of this thesis is to propose a method for sizing one of the most accurate hardware and software architecture for multispectral imaging application. The first step is the design of the NoC based on the network traffic. The automatic development of an emulation platform on a single FPGA or multi-FPGAs simplifies this step and determines the positioning of the computational components. Then, the design of computational components and their functions are validated using existing simulation platforms. The synthesizable model of the architecture on FPGA is then generated. The design flow is open. Several NoC structures can be inserted using the source model of this component. The set of results obtained points out the major parameters influencing the performances of architecture and the NoC itself. Several solutions are described and analyzed. These studies allow us to lay the groundwork for a complete MPSoC emulation platform based on NoCST ETIENNE-Bib. électronique (422189901) / SudocSudocFranceF

    Federated learning compression designed for lightweight communications

    Full text link
    Federated Learning (FL) is a promising distributed method for edge-level machine learning, particularly for privacysensitive applications such as those in military and medical domains, where client data cannot be shared or transferred to a cloud computing server. In many use-cases, communication cost is a major challenge in FL due to its natural intensive network usage. Client devices, such as smartphones or Internet of Things (IoT) nodes, have limited resources in terms of energy, computation, and memory. To address these hardware constraints, lightweight models and compression techniques such as pruning and quantization are commonly adopted in centralised paradigms. In this paper, we investigate the impact of compression techniques on FL for a typical image classification task. Going further, we demonstrate that a straightforward method can compresses messages up to 50% while having less than 1% of accuracy loss, competing with state-of-the-art techniques

    Evaluation and Design Space Exploration of a Time-Division Multiplexed NoC on FPGA for Image Analysis Applications

    Get PDF
    The aim of this paper is to present an adaptable Fat Tree NoC architecture for Field Programmable Gate Array (FPGA) designed for image analysis applications. Traditional NoCs (Network on Chip) are not optimal for dataflow applications with large amount of data. On the opposite, point to point communications are designed from the algorithm requirements but they are expensives in terms of resource and wire. We propose a dedicated communication architecture for image analysis algorithms. This communication mechanism is a generic NoC infrastructure dedicated to dataflow image processing applications, mixing circuit-switching and packet-switching communications. The complete architecture integrates two dedicated communication architectures and reusable IP blocks. Communications are based on the NoC concept to support the high bandwidth required for a large number and type of data

    Intégration rapide de services vidéo Mpeg sur architectures parallèles

    Get PDF
    Le temps réel pour des applications audiovisuelles est une forte contrainte qui nécessite la mise en oeuvre de plates-formes constituées de plusieurs unités de calcul. Le but de nos travaux est de développer un processus de prototypage rapide sur des architectures parallèles pour des applications de traitement d'image. Le processus de prototypage débute par la description des algorithmes grâce à une interface visuelle de programmation orientée objet. Cette description est ensuite transformée automatiquement pour pouvoir être utilisée par Syndex, un logiciel permettant d'évaluer et de générer l'ordonnancement des tâches de l'algorithme sur des architectures multiprocesseurs. Nous démontrons ici l'efficacité de notre méthodologie avec les développement d'une application Mpeg-2 conséquente et son implantation multi-DSP

    An adaptive embedded architecture for real-time Particle Image Velocimetry algorithms

    No full text
    International audienceParticle Image Velocimetry (PIV) is a method of im-aging and analysing fields of flows. The PIV tech-niques compute and display all the motion vectors of the field in a resulting image. Speeds more than thou-sand vectors per second can be required, each speed being environment-dependent. Essence of this work is to propose an adaptive FPGA-based system for real-time PIV algorithms. The proposed structure is ge-neric so that this unique structure can be re-used for any PIV applications that uses the cross-correlation technique. The major structure remains unchanged, adaptations only concern the number of processing operations. The required speed (corresponding to the number of vector per second) is obtained thanks to a parallel processing strategy. The image processing designer duplicates the processing modules to distrib-ute the operations. The result is a FPGA-based archi-tecture, which is easily adapted to algorithm specifica-tions without any hardware requirement. The design flow is fast and reliable

    Architecture/algorithm joint exploration of Hough transform on FPGA

    No full text
    International audience—The Hough transform is a robust algorithm used in shapes and objects recognition and it proves a high quality of results. This algorithm is also known for its use of large computing power. This paper presents a dedicated accumulator cache to optimize Hough space access. We explored the design space by evaluating the locality of accesses and found up to a speedup of 30 with a 2-line cache
    • …
    corecore